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Abstract— Decimal arithmetic has gained high impact on the 
overall performance of today's financial and commercial 
applications. Decimal arithmetic is pervasive in human-oriented 
applications but has a limited use in numerical data processing. 
Decimal additions and multiplication plays a main role in 
decimal operations which is used in any decimal arithmetic 
algorithm. Decimal digit adders and decimal digit multipliers are 
usually the building blocks for higher order decimal adders and 
multipliers. FPGAs provide an efficient hardware platform that 
can be employed for accelerating decimal algorithms. In this 
paper, different designs for two decimal digit adders and one 
decimal digit multiplier are proposed. The proposed designs were 
described, functionally tested, and implemented using VHDL and 
the Xilinx ISE 10.1 targeting Xilinx Vertix-5 XC5VLX30-3 
FPGA. This design is mainly used to decrease the area and as 
well as increase the functioning speed. Implementation results 
and comparison with existing designs are provided. 
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I. INTRODUCTION 

In electronics, an adder or summer is a digital circuit that 
performs addition of numbers. In many computers and other 
kinds of processors, adders are used not only in the arithmetic 
logic unit(s), but also in other parts of the processor, where 
they are used to calculate addresses, table indices, and similar 
operations .Although adders can be constructed for many 
numerical representations, such as binary-coded decimal or 
excess-3, the most common adders operate on binary numbers. 
In cases where two's complement or ones' complement is being 
used to represent negative numbers, it is trivial to modify an 
adder into an adder-subtractor. Other signed number 
representations require a more complex adder. The half adder 
adds two single binary digits A and B. It has two outputs, sum 
(S) and carry (C). The carry signal represents an overflow into 
the next digit of a multi-digit addition. The value of the sum is 
2C + S. The simplest half-adder design, pictured on the right, 
incorporates an XOR gate for S and an AND gate for C. With 
the addition of an OR gate to combine their carry outputs, two 
half adders can be combined to make a full adder. 

A full adder adds binary numbers and accounts for values 
carried in as well as out. A one-bit full adder adds three one-bit 
numbers, often written as A, B, and C in ; A and B are the 
operands, and C m is a bit carried in from the next less 
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significant stage. [21 The full-adder is usually a component in a 
cascade of adders, which add 8, 16, 32, etc. binary numbers. 
The circuit produces two-bit output, output carry and sum 
typically represented by the signals C out and S, Carry-lookahead 
adder to reduce the computation time, engineers devised faster 
ways to add two binary numbers by using carry-lookahead 
adders. They work by creating two signals (P and G) for each 
bit position, based on whether a carry is propagated through 
from a less significant bit position (at least one input is a T), 
generated in that bit position (both inputs are T), or killed in 
that bit position (both inputs are '0'). In most cases, P is simply 
the sum output of a half-adder and G is the carry output of the 
same adder. After P and G are generated the carries for every 
bit position are created. Some advanced carry-lookahead 
architectures are the Manchester carry chain, Brent-Kung 
adder, and the Kogge-Stone adder. 

H. RELATED WORK 

J. M. Rabaey proposed on Digital Integrated Circuits," 
IEEE Trans, on VLSI Systems, 2003. we proposed an area- 
efficient carry select adder by sharing the common Boolean 
logic term. After logic simplification and sharing partial circuit, 
we only need one XOR gate and one inverter gate in each 
summation operation as well as one AND gate and one inverter 
gate in each carry-out operation. 

O. Bedrij proposed on Carry Select Adder A large, 
extremely fast digital adder with sum selection and multiple- 
radix carry is described. Boolean expressions for the operation 
are included. The amount of hardware and the logical delay for 
a 100-bit ripple-carry adder and a carry-select adder are 
compared. The adder system described increases the speed of 
the addition process by reducing the carry-propagation time to 
the minimum commensurate with economical circuit design. 

m. FIELD Programmable Gate Arrays 

Before the advent of programmable logic, custom logic 
circuits were built at the board level using standard 
components, or at the gate level in expensive application- 
specific (custom) integrated circuits. The FPGA is an 
integrated circuit that contains many (64 to over 10,000) 
identical logic cells that can be viewed as standard 
components. Each logic cell can independently take on any 
one of a limited set of personalities. The individual cells are 
interconnected by a matrix of wires and programmable 
switches. A user's design is implemented by specifying the 
simple logic function for each cell and selectively closing the 
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switches in the interconnect matrix. The array of logic cells 
and interconnects form a fabric of basic building blocks for 
logic circuits. Complex designs are created by combining 
these basic blocks to create the desired circuit. 

A field-programmable gate array (FPGA) is an integrated 
circuit designed to be configured by the customer or designer 
after manufacturing hence "field-programmable". The FPGA 
configuration is generally specified using a hardware 
description language (HDL), similar to that used for an 
application-specific integrated circuit (ASIC). FPGAs can be 
used to implement any logical function that an ASIC could 
perform. The ability to update the functionality after shipping, 
partial re-configuration of the portion of the design and the low 
non-recurring engineering costs relative to an ASIC design (not 
withstanding the generally higher unit cost), offer advantages 
for many applications. 

I V . FPGA Architecture for the Challenge 

The FPGA is an array or island-style FPGA. It consists of 
an array of logic blocks and routing channels. Two I/O pads fit 
into the height of one row or the width of one column, as 
shown Fig 1. All the routing channels have the same width 
(number of wires). Each circuit must be mapped into the 
smallest square FPGA that can accommodate it. For example, a 
circuit containing 14 logic blocks and 10 I/O pads would be 
mapped into an FPGA consisting of a 4x4 array of logic 
blocks. Note that three of the twenty benchmark circuits used 
in the FPGA challenge (big key, des, and dsip) are pad-limited 
in this FPGA architecture. 



Routing 
Channel 




Logic Block 



Figure 1 . FPGA structure 

The FPGA logic block consists of a 4-input look-up table 
(LUT), and a flip flop, as shown in Fig 2. There is only one 
output, which can be either the registered or the unregistered 
LUT output. The logic block has four inputs for the LUT and a 
clock input. Since the clock is normally routed via a special- 
purpose dedicated routing network in commercial FPGAs. That 
is, you can completely ignore the clock net, since it is assumed 
to be routed on a special global network. 

A. Logic Block Strucuture 
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Each logic block input pin can connect to any one of the 
wiring segments in the channel adjacent to it. Each logic block 
output pin can connect to any of the wiring segments in the 
channels adjacent to it. (In the usual FPGA terminology, then, 
Fc = the number of tracks per channel, W),as shown in Fig. 3 
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Figure 3. Logic Block Pin to Routing Channel Interconnect 

Similarly, an I/O pad can connect to any one of the wiring 
segments in the channel adjacent to it. For example, an I/O pad 
at the top of the chip can connect to any of the W wires (where 
W is the channel width) in the horizontal channel immediately 
below it. Whenever a vertical and a horizontal channel intersect 
there is a switch box. In this architecture, when a wire enters a 
switch box, there are three programmable switches that allow it 
to connect to three other wires in adjacent channel segments. 
The pattern, or topology, of switches used in this architecture is 
the planar or domain-based switch box topology. In this switch 
box topology, a wire in track number one connects only to 
wires in track number one in adjacent channel segments, wires 
in track number 2 connect only to other wires in track number 
2 and so on. The Fig. 4 illustrates the connections in a switch 
box. 
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Figure 2. Logic Block Structure 
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Figure 4. Switch Box Topology 

Much of the prior research on FPGA routing architectures 
has assumed that input pin doglegs are possible. An input pin 
dogleg occurs when one connects more than one routing wire 
segment to the same logic block input pin. The input pin 
dogleg then, allows one to change tracks by connecting two 
wiring segments to a logic block input pin. Input pin doglegs 
are not allowed in the FPGA architecture to be used in the 
challenge; i.e. only one wire segment may be connected to a 
logic block input pin. 

However, the routing architecture above is not the routing 
architecture that is actually implemented by SRAM-based 
FPGAs. As shown below, commercial SRAM-based FPGAs 
normally place a buffer between routing tracks and the input 
pins to which they can connect to enhance speed. As well, to 
save area, the connection from routing wire segment to input 
pin is made via a multiplexer, not via a set of independent pass 
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transistors. Accordingly, it is not possible to connect two wire 
segments together via an input pin, and input pin doglegs are 
not possible. 




A 

Figure 5. More Realistic Architecture 



Note that it is possible to connect a logic block output pin 
to multiple wire segments in commercial FPGAs. Accordingly, 
such connections are allowed in the FPGA challenge. 
Applications of FPGAs include digital signal processing, 
software-defined radio, aerospace and defense systems, ASIC 
prototyping, medical imaging, computer vision, speech 
recognition, cryptography, bioinformatics, computer hardware 
emulation, radio astronomy, metal detection and a growing 
range of other areas. 

V. RESULTS 




Figure 9. Simulation Result 

B. SIMULA TION RESULTS OF PROPOSED DESIGN 




A. SIMULATION RESULTS OF EXISTING SYSTEM 





Figure 6. Simulation Result 



Figure 7. Simulation Result 





Figure 1 1 . Simulation Result 
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Figure 12. Simulation Result 
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Figure 8. Simulation Result 



Figure 1 3 . Simulation Result 



Figure 10. Simulation Result 
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VI. COMPARISON RESULTS 

To compare the circuit performance with three different 
architectures, 32-bit carry ripple adder, 32-bit carry select 
adder, and 32 -bit area-efficient carry select adder that is 
proposed in this paper. As for the transistor count, the transistor 
count of our proposed area-efficient carry select adder could be 
reduced to be very close to that of carry ripple adder; however, 
the transistor count in the conventional carry select adder is 
nearly double as compared with the proposed design. This 
result shows that sharing common Boolean logic term could 
indeed achieve a superior performance in aspect of transistor 
count. The area-efficient carry select adder can also achieve an 
outstanding performance in power consumption. Power 
consumption can be greatly saved in our proposed area- 
efficient carry select adder because we only need one XOR 
gate and one INV gate in each summation operation as well as 
one AND gate and one OR gate in each carry-out operation 
after logic simplification and sharing partial circuit. Because of 
hardware sharing, we can also significantly reduce the 
occurring chance of glitch. Besides, the improvement of power 
consumption can be more obvious as the input bit number 
increases. 
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Figure 14. The simulation results of the power consumption comparison in 
the proposed area-efficient carry select adder and the conventional carry select 

adder. 

We simulated the power consumption in the proposed area- 
efficient adder and the conventional carry select adder with 4, 
8, 16, and 32-bit respectively in tsmc 0.1 8um CMOS 
technology. Figures above shows the simulation results of the 
proposed area-efficient carry select adder and the conventional 
carry select adder. The power consumption difference between 
these two designs is small in the case of 4-bit input word 
length. Since the conventional carry select adder consists of the 
duplicated adder cells to prepare both the possible output 
values for the corresponding carry input values in advance. It 
not only needs larger hardware area, but also generates more 
glitch signals because of propagation path difference. 
Therefore, as the input bit number increases, the slope of power 
consumption increase in the conventional carry select adder 
would be larger than that in our proposed design. As the input 
bit number of the conventional carry select adder increases to 
32-bit, the power consumption in the conventional carry select 
adder will be 3.3 times larger than that in our proposed area- 
efficient carry select adder. 
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Figure 15. The simulation results of the computation speed comparison in the 
proposed area-efficient carry select adder and the conventional carry select 

adder. 

VII. CONCLUSION 

In this work, an area-efficient carry select adder is 
proposed. By sharing the common Boolean logic term, we can 
remove the duplicated adder cells in the conventional carry 
select adder. In this way, the transistor count in a 32-bitcarry 
select adder can be greatly reduced from 1947 to 960. 
Moreover, the power consumption can be reduced from 
1.26mw to 0.37mw as well as power delay product reduced 
from 2.14mw*ns to 1.28mw*ns. By retaining part of parallel 
architecture of conventional carry select adder, we can still 
maintain some competitiveness in speed. In this way, our area- 
efficient adder can perform with nearly the same transistor 
count, nearly the same power consumption, but with faster 
speed and lower PDP as compared with the carry ripple adder. 
This design is mainly used to reduce the area and as well as to 
increase the functioning speed. 
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